John Smith is in trouble! He is a Topcoder member and, once he learned to master the “Force” of dynamic programming, he began solving problem after problem. But his once obedient computer acts quite unfriendly today. Following his usual morning ritual, John woke up at 10 AM, had a cup of coffee and went to solve a problem before breakfast. Something didn’t seem right from the beginning, but based on his vast newly acquired experience, he wrote the algorithm in a flash. Tired of allocating matrices morning after morning, the computer complained: “Segmentation fault!”. Despite his empty stomach, John had a brilliant idea and got rid of his beloved matrix by adding an extra for loop. But the computer cried again: “Time limit exceeded!”
Instead of going nuts, John makes a radical decision. Enough programming, he says! He decides to take a vacation as a reward for his hard work.
Being a very energetic guy, John wants to have the time of his life! With so many things to do, it is unfortunately impossible for him to enjoy them all. So, as soon as he eats his breakfast, he devises a “Fun Plan” in which he describes a schedule of his upcoming activities:
ID | Scheduled Activity | Time Span |
---|---|---|
1 | Debug the room | Monday, 10:00 PM - Tuesday, 1:00 AM |
2 | Enjoy a trip to Hawaii | Tuesday, 6:00 AM - Saturday, 10:00 PM |
3 | Win the Chess Championship | Tuesday, 11:00 AM - Tuesday, 9:00 PM |
4 | Attend the Rock Concert | Tuesday, 7:00 PM - Tuesday, 11:00 PM |
5 | Win the Starcraft Tournament | Wednesday, 3:00 PM - Thursday, 3:00 PM |
6 | Have some paintball fun | Thursday, 10:00 AM - Thursday, 4:00 PM |
7 | Participate in the Topcoder Single Round Match | Saturday, 12:00 PM - Saturday, 2:00 PM |
8 | Take a shower | Saturday, 8:30 PM - Saturday 8:45 PM |
9 | Organize a Slumber Party | Saturday, 9:00 PM - Sunday, 6:00 AM |
10 | Participate in an “All you can eat” and “All you can drink” challenge | Saturday, 9:01 PM - Saturday, 11:59 PM |
He now wishes to take advantage of as many of these activities as he can. Such careful planning requires some cleverness, but his mind has gone on vacation too. This is John Smith’s problem, and he needs our help.
Could we help him have a nice holiday? Maybe we can! But let’s make an assumption first. As John is a meticulous programmer, once he agrees on something, he sticks to the plan. So, individual activities may either be chosen or not. For each of the two choices regarding the first activity, we can make another two choices regarding the second, and so on. After a short analysis, we find that we have 2^N possible selections, in our case 1024. We can then check each one individually to see whether it abides by the time restrictions. Among these, finding the selection with the most activities is trivial. There are quite a lot of alternatives, so John would need to enlist the help of his tired computer. But what happens if we have 50 activities? Even with the most powerful computer in the world, handling this situation would literally take years. So, this approach is clearly not feasible.
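To make the cost concrete, here is a small Python sketch of this brute-force idea. The encoding of John’s schedule as minutes elapsed since Monday 00:00 is our own; the times themselves come from the table above. The sketch enumerates all 1024 subsets of the 10 activities and keeps the largest compatible one.

```python
from itertools import combinations

# John's schedule as (start, end) in minutes since Monday 00:00
# (the numeric encoding is ours; the times are from the table above)
ACTIVITIES = [
    (1320, 1500),   # 1: Debug the room
    (1800, 8520),   # 2: Trip to Hawaii
    (2100, 2700),   # 3: Chess Championship
    (2580, 2820),   # 4: Rock Concert
    (3780, 5220),   # 5: Starcraft Tournament
    (4920, 5280),   # 6: Paintball
    (7920, 8040),   # 7: Topcoder SRM
    (8430, 8445),   # 8: Shower
    (8460, 9000),   # 9: Slumber Party
    (8461, 8639),   # 10: Eat-and-drink challenge
]

def compatible(subset):
    """True if no two chosen activities overlap in time."""
    return all(a[1] <= b[0] or b[1] <= a[0]
               for a, b in combinations(subset, 2))

def brute_force(activities):
    """Try all 2^N subsets and keep the size of the largest compatible one."""
    best = 0
    for mask in range(1 << len(activities)):
        chosen = [a for i, a in enumerate(activities) if mask >> i & 1]
        if compatible(chosen):
            best = max(best, len(chosen))
    return best

print(brute_force(ACTIVITIES))  # -> 6
```

With 50 activities the same loop would run 2^50 times, which is exactly why this approach does not scale.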
Let’s simplify the problem and trust our basic instinct for a moment. A good approach may be to seize each chance as the first opportunity arises. That is, if two activities we can follow clash, we choose the one that starts earlier in order to save some time. In this case John will start his first evening by debugging his room. Early the next morning, he has a plane to catch. Less than a day in, and he has already started his second activity. This is great! Actually, the best choice so far. But what happens next? Spending five days in Hawaii is time consuming, and by Saturday evening he will still have only two activities performed. Think of all the activities he could have done during this five-day span! Although very fast and simple, this approach is unfortunately not accurate.
We still don’t want to check for every possible solution, so let’s try another trick. Committing to such a time intensive activity like the exotic trip to Hawaii can simply be avoided by selecting first the activity which takes the least amount of time and then continuing this process for the remaining activities that are compatible with those already selected. According to the previous schedule, first of all we choose the shower. With only 15 minutes consumed, this is by far the best local choice. What we would like to know is whether we can still keep this “local best” as the other compatible activities are being selected. John’s schedule will look like this:
Take a shower (15 minutes)
Participate in the Topcoder Single Round Match (2 hours)
Participate in an “All you can eat” and “All you can drink” challenge (2 hours 58 minutes)
Debug the room (3 hours)
Attend the Rock Concert (4 hours)
Have some paintball fun (6 hours)
Out of the 10 possible activities, we were able to select 6 (which is not so bad). We now run the slow but trustworthy algorithm to see if this is actually the best choice we can make, and the answer is indeed 6. John is very appreciative of our help, but once he returns from the holiday, confident in our ingenious approach, he may face a serious problem:
Suppose, for example, that a short date overlaps with both a school exam and a match of his favorite team. By going for the short date, he misses both. Being the Topcoders that we are, we must get used to writing reliable programs. A single case which we cannot handle dooms this approach to failure.
What we generally have to do in situations like this is to analyze what caused the error in the first place and act accordingly to avoid it in the future. Let’s look again at the previous scenario. The dating activity clashes with both the exam and the match, while the other two clash only with the date. So, the idea almost suggests itself: why not always select the activity that produces the minimum number of clashes with the remaining activities? It seems logical, so let’s try to prove that this approach is indeed correct. Suppose we have already selected an activity X, and check whether we could instead have selected two activities A and B that clash with X. A and B must of course not clash with each other, otherwise the final result will not improve. But then we are back to the previous case (X has two clashes, while A and B have only one each), so A and B would have been selected from the beginning. The only way to disprove our assumption is to make A and B clash more, without affecting any activities except X. This is not very intuitive, but if we think it through we can (unfortunately) build such a case:
The activities represented by the blue lines are the optimal choice for the above schedule. But as the activity in red produces only 2 clashes, it will be chosen first. Four compatible activities remain before it, but they all clash with each other, so we can select only one of them. The same happens with the activities scheduled after it, leaving room for only one more choice. This gives us only 3 activities, while the optimal choice selects 4.
So far, every solution we came up with had a hidden flaw. It seems we have to deal with a devilish problem. Actually, this problem has quite an elegant and straightforward solution. If we study the figure above more carefully, we see that the blue activity on the bottom-left is the only one which finishes before the “timeline” indicated by the thin vertical bar. So, if we are to choose a single activity, choosing the one that ends first (at a time t1), will leave all the remaining time interval free for choosing other activities. If we choose any other activity instead, the remaining time interval will be shorter. This is obvious, because we will end up anyway with only one activity chosen, but at a time t2 > t1. In the first case we had available all the time span between t1 and finish and that included the time between t2 and finish. Consequently, there is no disadvantage in choosing the activity that finishes earlier. The advantage may result in the situation when we are able to insert another activity that starts between t1 and t2 and ends up before the end of any activity that starts after time t2.
Known as “Activity Selection”, this is a standard problem that can be solved by the greedy method. Just as a greedy man takes as much as he can as often as he can, in our case we choose at every step the activity that finishes first, and we do so whenever there is no activity in progress. The truth is we all make greedy decisions at some point in our lives. When we go shopping or when we drive a car, we make choices that seem best for the moment. Actually, there are two basic ingredients every greedy algorithm has in common:
Greedy Choice Property: from a local optimum we can reach a global optimum, without having to reconsider the decisions already taken.
Optimal Substructure Property: the optimal solution to a problem can be determined from the optimal solutions to its subproblems.
The following pseudo code describes the optimal activity selection given by the “greedy” algorithm proven earlier:
Let N denote the number of activities and I an activity, 1 <= I <= N
For each activity I, let S[I] and F[I] be its starting and finishing times
Sort the activities in increasing order of their finishing times,
    that is, for every I < J we must have F[I] <= F[J]

// A denotes the set of the activities that will be selected
A = {1}
// J denotes the last activity selected
J = 1
For I = 2 to N
    // we can select activity 'I' only if the last activity
    // selected has already been finished
    If S[I] >= F[J]
        // select activity 'I'
        A = A + {I}
        // activity 'I' now becomes the last activity selected
        J = I
    Endif
Endfor
Return A
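The pseudocode above translates directly into Python. As before, the minute-based encoding of John’s schedule is our own; the function returns the 1-based IDs of the selected activities.

```python
def select_activities(activities):
    """Greedy activity selection: repeatedly take the compatible
    activity that finishes first."""
    # process activities in increasing order of finishing time,
    # remembering their original 1-based IDs
    order = sorted(range(len(activities)), key=lambda i: activities[i][1])
    chosen, last_finish = [], float("-inf")
    for i in order:
        start, finish = activities[i]
        if start >= last_finish:        # no activity in progress
            chosen.append(i + 1)
            last_finish = finish
    return chosen

# John's schedule as (start, end) in minutes since Monday 00:00
ACTIVITIES = [
    (1320, 1500), (1800, 8520), (2100, 2700), (2580, 2820),
    (3780, 5220), (4920, 5280), (7920, 8040), (8430, 8445),
    (8460, 9000), (8461, 8639),
]
print(select_activities(ACTIVITIES))  # -> [1, 3, 5, 7, 8, 10]
```

The six selected IDs correspond exactly to the plan listed below: debugging, chess, Starcraft, the SRM, the shower and the eating challenge.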
After applying the above algorithm, Johnny’s “Fun Plan” would look like this:
Eliminate all the bugs and take some time to rest
Tuesday is for chess, prepare to beat them all
A whole day of Starcraft follows, this should be fun
The next two days are for recovery
As for the final day, get a few rating points on topcoder, take a shower and enjoy the versatile food and the good quality wine
John Smith’s problem is solved, but this is just one example of what greedy can do. A few examples of real Topcoder problems will help you understand the concept better. But before moving on, you may wish to practice a little more of what you have read so far on a problem similar to Activity Selection, named Boxing.
BioScore
In this problem you are asked to maximize the average homology score over all the pairs in the set. As an optimal solution is required, this may be a valuable clue for determining the appropriate method. Usually, this kind of problem can be solved by dynamic programming, but in many cases a greedy strategy can also be employed.
The first thing we have to do here is build the frequency matrix. This is an easy task, as we just have to compare every pair of sequences and count the occurrences of all the combinations of nucleic acids (AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT). Each of these combinations is an element of the matrix, and its value is the total number of occurrences. For example, let’s take the set { “ACTAGAGAC”, “AAAAAAAAA”, “TAGTCATAC”, “GCAGCATTC” } used in Example 2.
AA: 14 | CC: 3 | GG: 0 | TT: 1 |
---|---|---|---|
AC + CA: 11 | AG + GA: 10 | AT + TA: 10 | |
CG + GC: 2 | CT + TC: 0 | | |
GT + TG: 3 | | | |
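The frequency-counting step can be sketched in Python as follows. The function name is ours, and symmetric combinations such as AC and CA are merged into one entry, as in the table above. Since every position of every pair is counted exactly once, the counts must total C(4, 2) pairs times 9 positions, i.e. 54.

```python
from itertools import combinations
from collections import Counter

def frequency_matrix(seqs):
    """Count nucleotide combinations over all pairs of sequences,
    merging symmetric pairs (AC and CA land in the same bucket)."""
    freq = Counter()
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            # canonical order: 'CA' is recorded as 'AC'
            freq[min(x, y) + max(x, y)] += 1
    return freq

SEQS = ["ACTAGAGAC", "AAAAAAAAA", "TAGTCATAC", "GCAGCATTC"]
freq = frequency_matrix(SEQS)
# sanity check: 6 pairs * 9 positions = 54 comparisons in total
print(sum(freq.values()))  # -> 54
```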
Best = -Infinity
For S[1] = 1 to 10
    For S[2] = 1 to 10
        For S[3] = 1 to 10
            For S[4] = 1 to 10
                If (S[1] + S[2] + S[3] + S[4]) mod 2 = 0
                    S[5] = S[6] = 10
                    S[7] = 10 - (S[1] + S[2] + S[3] + S[4]) / 2
                    S[8] = S[9] = S[10] = -10
                    // in Best we keep the greatest average homology
                    // score obtained so far
                    Best = max(Best, score(F, S))
                Endif
            Endfor
        Endfor
    Endfor
Endfor
Return Best
Given the score matrix (in our case the array S), we compute the final result by summing the products F[I] * S[I] (1 <= I <= 10) and dividing the total by N * (N - 1) / 2 to obtain the average homology score.
GoldMine
We are now going to see how a gold mine can be exploited to its fullest, by being greedy. Whenever we notice the maximum profit is involved, a greedy switch should activate. In this case, we must allocate all the miners to the available mines, such that the total profit is maximized. After a short analysis, we realize that we want to know how much money can be earned from a mine in all the possible cases. And there are not so many cases, as in each mine we can only have between 0 and 6 workers. The table below represents the possible earnings for the two mines described in the example 0 of the problem statement:
 | 0 workers | 1 worker | 2 workers | 3 workers | 4 workers | 5 workers | 6 workers |
---|---|---|---|---|---|---|---|
First mine | 0 | 57 | 87 | 87 | 67 | 47 | 27 |
Second mine | 0 | 52 | 66 | 75 | 75 | 66 | 48 |
From these earnings we can derive the marginal profit that each additional worker brings to a mine:

 | Initially | Worker 1 | Worker 2 | Worker 3 | Worker 4 | Worker 5 | Worker 6 |
---|---|---|---|---|---|---|---|
First mine | - | 57 | 30 | 0 | -20 | -20 | -20 |
Second mine | - | 52 | 14 | 9 | 0 | -9 | -18 |
Gold Mine Status | Profit from extra-worker 1 | Profit from extra-worker 2 |
---|---|---|
number of ores > number of workers + 2 | 60 | 60 |
number of ores = number of workers + 2 | 60 | 50 |
number of ores = number of workers + 1 | 50 | -20 |
number of ores < number of workers + 1 | -20 | -20 |
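The marginal-profit view suggests a simple greedy sketch. This is our own illustration, not the exact contest solution: hand out miners one at a time, always to the mine whose next worker brings the largest marginal profit. This is safe as long as each mine’s marginal profits are non-increasing, as they are in the tables above.

```python
import heapq

def assign_miners(earnings, miners):
    """Greedy miner allocation: repeatedly give the next miner to the
    mine whose next worker yields the largest marginal profit.
    earnings[i][w] is mine i's total earnings with w workers."""
    # max-heap of (-marginal profit, mine index, worker count after hire)
    heap = [(-(e[1] - e[0]), i, 1) for i, e in enumerate(earnings)]
    heapq.heapify(heap)
    workers = [0] * len(earnings)
    total = 0
    for _ in range(miners):
        neg, i, w = heapq.heappop(heap)
        total += -neg
        workers[i] = w
        if w + 1 < len(earnings[i]):   # mine can still take another worker
            heapq.heappush(heap, (-(earnings[i][w + 1] - earnings[i][w]), i, w + 1))
    return workers, total

# earnings per worker count for the two mines of example 0
EARNINGS = [[0, 57, 87, 87, 67, 47, 27],
            [0, 52, 66, 75, 75, 66, 48]]
print(assign_miners(EARNINGS, 4))  # -> ([2, 2], 153)
```

With 4 miners the greedy picks the marginals 57, 52, 30 and 14, putting two workers in each mine for a total of 87 + 66 = 153, which matches the best split by inspection.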
WorldPeace

In this problem we must form as many groups of K citizens as possible, with all the members of a group coming from different countries. A first, straightforward greedy builds one group at a time, always taking one citizen from each of the K countries that currently have the most ungrouped citizens (the array A holds the countries’ citizen counts):
Groups = 0
Repeat
    // sort the array in decreasing order
    Sort(A)
    Min = A[K]
    If Min > 0 Then Groups = Groups + 1
    For I = 1 to K
        A[I] = A[I] - 1
    Endfor
Until Min = 0
Return Groups
Unfortunately, a country can have up to a billion citizens, so we cannot afford to make only one group at a time. Theoretically, for a given set of K countries, we can keep making groups until all the citizens of one of these countries have been grouped. And this can be done in a single step:
Groups = 0
Repeat
    // sort the array in decreasing order
    Sort(A)
    Min = A[K]
    Groups = Groups + Min
    For I = 1 to K
        A[I] = A[I] - Min
    Endfor
Until Min = 0
Return Groups
The execution time is no longer a problem, but the algorithm now is! Checking it against example 0, our method returns 4 instead of 5. The results returned for examples 1, 2 and 3 are correct. As for the last example, instead of making 3983180234 groups, we make only 3983180207. Given the small difference, our solution is pretty good, so maybe we can refine it further in this direction.
So far, we have two algorithms:
a first greedy algorithm that is accurate, but not fast enough
a second greedy algorithm that is fast, but not very accurate.
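The contrast between the two is easy to see in Python. The implementations below are our transcriptions of the pseudocode, and the tiny instance is our own: three countries of two citizens each admit three groups of K = 2 (pair the countries round-robin), but the fast variant finds only two.

```python
def groups_slow(populations, k):
    """Accurate but slow: form one group at a time from the K most
    populous countries."""
    a = sorted(populations, reverse=True)
    groups = 0
    while a[k - 1] > 0:          # the K-th largest country is non-empty
        groups += 1
        for i in range(k):
            a[i] -= 1
        a.sort(reverse=True)
    return groups

def groups_fast(populations, k):
    """Fast but inaccurate: in each round, drain the K most populous
    countries by the K-th largest population all at once."""
    a = sorted(populations, reverse=True)
    groups = 0
    while a[k - 1] > 0:
        m = a[k - 1]
        groups += m
        for i in range(k):
            a[i] -= m
        a.sort(reverse=True)
    return groups

print(groups_slow([2, 2, 2], 2))  # -> 3 (optimal)
print(groups_fast([2, 2, 2], 2))  # -> 2 (suboptimal)
```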
What we want to do is to optimize accuracy as much as we can, without exceeding the execution time limit. Basically, we are looking for a truce between speed and accuracy. The only difference in the two algorithms described above is the number of groups we select at a given time. The compromise we will make is to select an arbitrarily large number of groups in the beginning, and as we approach the end to start being more cautious. When we are left with just a few ungrouped citizens in every country, it makes complete sense to use the safe brute force approach. In the variable Allowance defined in the algorithm below, we control the number of groups we want to make at a given moment.
Groups = 0
Repeat
    // sort the array in decreasing order
    Sort(A)
    Min = A[K]
    // integer division: take about a thousandth of Min, but at least 1
    Allowance = (Min + 999) / 1000
    Groups = Groups + Allowance
    For I = 1 to K
        A[I] = A[I] - Allowance
    Endfor
Until Min = 0
Return Groups
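A Python rendition of this refinement (our transcription of the pseudocode above): on small inputs the allowance is 1, so it behaves exactly like the accurate one-group-at-a-time algorithm.

```python
def groups_refined(populations, k):
    """Compromise: drain the top K countries by roughly a thousandth of
    the K-th largest population per round, falling back to one group at
    a time as the populations get small."""
    a = sorted(populations, reverse=True)
    groups = 0
    while a[k - 1] > 0:
        # integer division: about Min / 1000, but always at least 1
        allowance = (a[k - 1] + 999) // 1000
        groups += allowance
        for i in range(k):
            a[i] -= allowance
        a.sort(reverse=True)
    return groups

print(groups_refined([2, 2, 2], 2))  # -> 3
```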
Whether this approach is indeed correct remains to be seen. Despite the fact that it escaped both Tomek’s keen eyes and the system tests, it is quite possible that the result is not optimal for every possible test case. This was just an example to show that a carefully chosen refinement of a simple (but obviously faulty) greedy approach can actually be the “right” way. For more accurate solutions to this problem, see the Match Editorial.
Conclusion
Greedy algorithms are usually easy to think of, easy to implement and fast to run. Proving their correctness may require rigorous mathematical proofs and is sometimes insidiously hard. In addition, greedy algorithms are infamous for being tricky: missing even a very small detail can be fatal. But when you have nothing else at your disposal, they may be the only salvation. With backtracking or dynamic programming you are on relatively safe ground. With greedy, it is more like walking through a minefield: everything looks fine on the surface, but the hidden part may backfire on you when you least expect it. While there are some standardized problems, most of the problems solvable by this method call for heuristics. There is no general template for applying the greedy method to a given problem, though the problem specification might give you a good insight. Advanced mathematical concepts such as matroids may give you a recipe for proving that a class of problems can be solved greedily, but ultimately it comes down to the keen sense and experience of the programmer. In some cases there are many greedy assumptions one can make, but only a few of them are correct (see the Activity Selection problem). In other cases, a hard problem may hide an ingenious greedy shortcut, as was the case in the last problem discussed, WorldPeace. And this is actually the whole beauty of greedy algorithms! Needless to say, they can provide excellent challenge opportunities…
A few final notes
a problem that seems extremely complicated on the surface (see TCSocks) might signal a greedy approach.
problems with a very large input size (such that an n^2 algorithm is not fast enough) are also more likely to be solved by greedy than by backtracking or dynamic programming.
despite the rigor behind them, you should look at greedy approaches through the eyes of a detective, not the glasses of a mathematician.
in addition, study some of the standard greedy algorithms to grasp the concept better (fractional knapsack, Prim’s algorithm, Kruskal’s algorithm, Dijkstra’s algorithm, Huffman coding, optimal merging, topological sort).
Further Problems
Level 1
GroceryBagger – SRM 222
FanFailure – SRM 195
PlayGame – SRM 217
SchoolAssembly – TCO04 Round 2
RockStar – SRM 216
Apothecary – SRM 204
Boxing – TCO04 Round 3
Unblur – TCO04 Semifinal Room 3
Level 2
Crossroads – SRM 217
TCSocks – SRM 207
HeatDeath – TCO04 Round 4
BioScore – TCO04 Semifinal Room 1
Rationalization – SRM 224
Level 3
GoldMine – SRM 169
MLBRecord – TCO04 Round 2
RearrangeFurniture – SRM 220
WorldPeace – SRM 204